A variety of metrics have been proposed to measure the relative importance of nodes in a network. One of these, α-centrality [1], measures the number of attenuated paths that exist between nodes. We introduce a normalized version of this metric and use it to study network structure, specifically, to rank nodes and find the community structure of the network. To find communities, we extend the modularity-maximization method [5] for community detection to use this metric as the measure of node connectivity. Normalized α-centrality is a powerful tool for network analysis, since it contains a tunable parameter that sets the length scale of interactions. Studying how rankings and discovered communities change when this parameter is varied allows us to identify locally and globally important nodes and structures. We apply the proposed method to several benchmark networks and show that it leads to better insight into network structure than alternative methods.
I. INTRODUCTION

Centrality measures the degree to which network structure contributes to the importance, or status, of a node in a network. Over the years many different centrality metrics have been defined. One of the more popular metrics, betweenness centrality [3], measures the fraction of all shortest paths in a network that pass through a given node. Other centrality metrics include those based on random walks [2H7] and path-based metrics. The simplest path-based metric, degree centrality, measures the number of edges that connect a node to others in a network. According to this measure, the most important nodes are those that have the most connections. However, a node's centrality depends not only on how many others it is connected to but also on the centralities of those nodes [1, 5]. This dependence is captured by the total number of paths linking a node to other nodes in a network. One such metric, α-centrality [HH], measures the total number of paths from a node, exponentially attenuated by their length. The attenuation parameter sets the length scale of interactions. Unlike other centrality metrics, which do not distinguish between local and global structure, a parameterized centrality metric can differentiate between locally connected nodes, i.e., nodes that are linked to other nodes which are themselves interconnected, and globally connected nodes that link and mediate communication between poorly connected groups of nodes. Studies of human [TUrfT^j and animal [13] populations suggest that such 'bridges' or 'brokers' play a crucial role in the information flow and cohesiveness of the entire group.

One difficulty in applying α-centrality in network analysis is that its key parameter is bounded by the spectrum of the corresponding adjacency matrix of the network. As a result, the metric diverges for larger values of this parameter. We address this problem by defining normalized α-centrality. We show that the new metric avoids the problem of bounded parameters while retaining the desirable characteristics of α-centrality, namely its ability to differentiate between local and global structures.

In addition to ranking nodes, parameterized centrality can be used to identify communities within a network [14]. In this paper, we generalize the modularity maximization-based approach [15, 16] to use normalized α-centrality. Rather than find regions of the network that have a greater than expected number of edges connecting nodes [2], our approach looks for regions that have a greater than expected number of weighted paths connecting nodes. One advantage of this method is that the attenuation parameter can be varied to identify local vs. global communities.

Normalized α-centrality is a powerful tool for network analysis. By differentiating between locally and globally connected nodes, it provides a simple alternative to previous attempts to quantify the fine-grained structure of complex networks, such as the motif-based [17, 18] and role-based [19, 20] descriptions. The former measures the relative abundance of subgraphs of a certain type, while the latter classifies nodes according to their connectivity within and outside of their community. Applying either of these descriptions to real networks is computationally expensive: role-based analysis, for example, requires the network to be decomposed into distinct communities first. Normalized α-centrality, on the other hand, measures node connectivity at different length scales, allowing us to resolve network structure in a computationally efficient manner.

We use normalized α-centrality to study the structure of several benchmark networks, as well as a real-world online social network. We show that this parameterized centrality metric can identify locally and globally important nodes and communities, leading to a more nuanced understanding of network structure.



II. CENTRALITY AND NETWORK 
STRUCTURE 

Bonacich [HH] defined α-centrality $C_{ij}(\alpha, \beta, n)$ as the total number of attenuated paths between nodes i and j, with β and α giving the attenuation factors [57] along direct edges (from i) and indirect edges (from intermediate nodes) in the path from i to j, respectively, and n the length of the longest path. Given the adjacency matrix of the network A, the α-centrality matrix is defined as follows:

$$C(\alpha, \beta, n) = \beta A + \beta \alpha_1 A^2 + \cdots + \beta \prod_{k=1}^{n} \alpha_k A^{n+1} \tag{1}$$

The first term gives the number of paths of length one (edges) from i to j, the second gives the number of paths of length two, etc. Although the $\alpha_k$ along different edges in a path could in principle be different, for simplicity we take them all to be equal: $\alpha_k = \alpha, \forall k$. In this case, the series converges to $C(\alpha, \beta, n \to \infty) = \beta A (I - \alpha A)^{-1}$, which holds while $\alpha < 1/\lambda_1$, where $\lambda_1$ is the largest characteristic root of A [ST]. The computation of $\lambda_1$ is difficult, especially for large networks, which include most complex real-world networks.
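As a minimal illustration of Eq. (1), the sketch below compares the truncated series with the closed-form limit on a toy graph. It assumes NumPy and a single attenuation factor α for all indirect edges; the function names and the three-node path graph are hypothetical choices made for this example, not part of the original study.

```python
import numpy as np

def alpha_centrality_series(A, alpha, beta=1.0, n=50):
    """Truncated series of Eq. (1): beta*A + beta*alpha*A^2 + ... + beta*alpha^n*A^(n+1)."""
    A = np.asarray(A, dtype=float)
    term = beta * A              # paths of length one
    total = term.copy()
    for _ in range(n):
        term = alpha * term @ A  # attenuate and extend every path by one edge
        total += term
    return total

def alpha_centrality_closed_form(A, alpha, beta=1.0):
    """Limit n -> infinity, beta * A (I - alpha*A)^(-1); valid only for alpha < 1/lambda_1."""
    A = np.asarray(A, dtype=float)
    identity = np.eye(A.shape[0])
    return beta * A @ np.linalg.inv(identity - alpha * A)

# Toy example: a path graph 0 - 1 - 2, whose largest eigenvalue is sqrt(2).
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
alpha = 0.3  # below 1/sqrt(2), so the series converges
series = alpha_centrality_series(A, alpha, n=200)
closed = alpha_centrality_closed_form(A, alpha)
```

For α at or above 1/λ_1 the truncated sum keeps growing with n, which is the divergence that motivates the normalization introduced next.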

To get around this difficulty, we define the normalized α-centrality matrix as:

$$NC(\alpha, \beta, n \to \infty) = \frac{C(\alpha, \beta, n \to \infty)}{\sum_{i,j} C_{ij}(\alpha, \beta, n \to \infty)} \tag{2}$$



As we show in the appendix, in contrast to α-centrality, normalized α-centrality is not bounded by $\lambda_1$. Also, we prove that, assuming $|\lambda_1|$ is strictly greater than any other eigenvalue, $\lim_{\alpha \to 1/|\lambda_1|} NC(\alpha, \beta, n \to \infty)$ exists; and as α is increased, $NC(\alpha, \beta, n \to \infty)$ converges to this value and is finite for α < 1.

Just like the original α-centrality, normalized α-centrality contains a tunable parameter α that sets the length scale of interactions. For α = 0, (normalized) α-centrality takes into account direct edges only. As α increases, $NC(\alpha, \beta, n \to \infty)$ becomes a more global measure, taking into account ever larger network components. The expected length of a path, the radius of centrality, is $(1 - \alpha)^{-1}$.
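The normalization of Eq. (2) can be sketched in a few lines. The example below assumes NumPy and uses the closed form, so it applies only for α below 1/λ_1; the function name and the four-node toy graph (a triangle with one pendant node) are hypothetical.

```python
import numpy as np

def normalized_alpha_centrality(A, alpha, beta=1.0):
    """Eq. (2) in the n -> infinity limit, using the closed form (alpha < 1/lambda_1)."""
    A = np.asarray(A, dtype=float)
    C = beta * A @ np.linalg.inv(np.eye(A.shape[0]) - alpha * A)
    return C / C.sum()  # divide by the total over all pairs (i, j)

# Hypothetical toy graph: triangle (0, 1, 2) with a pendant node 3 attached to node 0.
A = np.array([[0, 1, 1, 1],
              [1, 0, 1, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 0]])
NC_local = normalized_alpha_centrality(A, alpha=0.0)  # direct edges only
NC_wider = normalized_alpha_centrality(A, alpha=0.3)  # 0.3 is below 1/lambda_1 (about 0.46)
```

At α = 0 the result is simply the adjacency matrix divided by its total weight, while at α = 0.3 nodes 1 and 3, which share no edge, already receive a non-zero score through the paths that pass via node 0.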



A. Node Ranking 

Much of the analysis done by social scientists considered local structure, i.e., the number [35] and nature [10, 11, 26] of an individual's ties. By focusing on local structure, however, traditional theories fail to take into account the macroscopic structure of the network. Many metrics proposed and studied over the years deal with this shortcoming, including PageRank [5] and random walk centrality [Jj. These metrics aim to identify nodes that are 'close' in some sense to other nodes in the network, and are therefore more important. PageRank, for example, gives the probability that a random walk initiated at node i will reach j, while random-walk centrality computes the number of times a node i will be visited by walks from all pairs of nodes in the network.

Normalized α-centrality, $NC_i(\alpha, \beta, n \to \infty) = \sum_j NC_{ij}(\alpha, \beta, n \to \infty)$, also measures how 'close' node i is to other nodes in a network and can be used to rank the nodes accordingly. The presence of a tunable parameter turns normalized α-centrality into a powerful tool for studying network structure and allows us to seamlessly connect the rankings produced by well-known local and global centrality metrics. For α = 0, normalized α-centrality takes into account local interactions that are mediated by direct edges only, and therefore reduces to degree centrality. As α increases and longer range interactions become more important, nodes that are connected by longer paths grow in importance. For $\alpha < 1/\lambda_1$, the rankings produced by normalized α-centrality are equivalent to those produced by α-centrality. Also, as shown in the Appendix, for symmetric matrices, as $\alpha \to 1/|\lambda_1|$, normalized α-centrality converges to eigenvector centrality £Q. The rankings no longer change as α increases further, since α has reached some fundamental length scale of the network.
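The two limiting cases described above can be checked numerically. The sketch below assumes NumPy, a symmetric adjacency matrix, and the closed form of α-centrality, so α is kept just below 1/λ_1; the helper name and the toy graph are hypothetical.

```python
import numpy as np

def node_scores(A, alpha, beta=1.0):
    """NC_i = sum_j NC_ij, using the closed form valid for alpha < 1/lambda_1."""
    A = np.asarray(A, dtype=float)
    C = beta * A @ np.linalg.inv(np.eye(A.shape[0]) - alpha * A)
    return C.sum(axis=1) / C.sum()

# Hypothetical toy graph: triangle (0, 1, 2) with a pendant node 3 attached to node 0.
A = np.array([[0, 1, 1, 1],
              [1, 0, 1, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 0]], dtype=float)

eigvals, eigvecs = np.linalg.eigh(A)   # symmetric A; eigenvalues in ascending order
lam1 = eigvals[-1]
v = np.abs(eigvecs[:, -1])             # leading eigenvector with its sign fixed
eig_centrality = v / v.sum()

degree_share = A.sum(axis=1) / A.sum()              # degree centrality, rescaled to sum to 1
local_scores = node_scores(A, alpha=0.0)            # direct edges only
global_scores = node_scores(A, alpha=0.999 / lam1)  # just below the convergence bound
```

Here `local_scores` coincides with the rescaled degrees, and `global_scores` agrees with the rescaled leading eigenvector to within about one part in a thousand, the residual being the contribution of the non-leading eigenvalues at this particular α.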



B. Community Detection 

Girvan & Newman [2] proposed modularity as a metric for evaluating the community structure of a network. The modularity-optimization class of community detection algorithms [15, 16, 27] finds a network division that maximizes the modularity, which is defined as Q = (connectivity within community) − (expected connectivity), where connectivity is measured by the density of edges. We extend this definition to use normalized α-centrality as the measure of network connectivity. According to this definition, in the best division of a network, there are more weighted paths connecting nodes to others within their own community than to nodes in other communities. Modularity can, therefore, be written as:

$$Q(\alpha) = \sum_{ij} \left[ NC_{ij}(\alpha, n \to \infty) - \overline{NC}_{ij}(\alpha, n \to \infty) \right] \delta(s_i, s_j) \tag{3}$$

$NC_{ij}(\alpha, n \to \infty)$ is given by Eq. (2). Since β factors out of modularity, without loss of generality we take β = 1. α can be varied from 0 to 1. $\overline{NC}_{ij}(\alpha, n \to \infty)$ is the expected normalized α-centrality, and $s_i$ is the index of the community i belongs to, with $\delta(s_i, s_j) = 1$ if $s_i = s_j$; otherwise, $\delta(s_i, s_j) = 0$. We round the values of $NC_{ij}(\alpha, n \to \infty)$ to the nearest integer.

To compute $\overline{NC}_{ij}(\alpha, n \to \infty)$, we consider a graph, referred to as the null model, which has the same number of nodes and edges as the original graph, but in which the edges are placed at random. To make the derivation below more intuitive, instead of normalized α-centrality, we talk of the number of attenuated paths. In normalized α-centrality, the number of attenuated paths is scaled by a constant, hence the derivation below holds true. When all the nodes are placed in a single group, then axiomatically, Q(α) = 0. Therefore $\sum_{ij} [NC_{ij}(\alpha, n \to \infty) - \overline{NC}_{ij}(\alpha, n \to \infty)] = 0$, and we set $W = \sum_{ij} \overline{NC}_{ij}(\alpha, n \to \infty) = \sum_{ij} NC_{ij}(\alpha, n \to \infty)$.

Therefore, according to the argument above, the total number of paths between nodes in the null model, $\sum_{ij} \overline{NC}_{ij}(\alpha, n \to \infty)$, is equal to the total number of paths in the original graph, $\sum_{ij} NC_{ij}(\alpha, n \to \infty)$. We further restrict the choice of null model to one where the expected number of paths reaching node j, $W_j^{in}$, is equal to the actual number of paths reaching the corresponding node in the original graph: $W_j^{in} = \sum_i \overline{NC}_{ij}(\alpha, n \to \infty) = \sum_i NC_{ij}(\alpha, n \to \infty)$. Similarly, we also assume that in the null model, the expected number of paths originating at node i, $W_i^{out}$, is equal to the actual number of paths originating at the corresponding node in the original graph: $W_i^{out} = \sum_j \overline{NC}_{ij}(\alpha, n \to \infty) = \sum_j NC_{ij}(\alpha, n \to \infty)$. $W$, $W_i^{out}$ and $W_j^{in}$ are then rounded to the nearest integers.

Next, we reduce the original graph G to a new graph G' that has the same number of nodes as G and total number of edges W, such that each edge has weight 1 and the number of edges between nodes i and j in G' is $NC_{ij}(\alpha, \beta, n \to \infty)$. Now the expected number of paths between i and j in graph G can be taken as the expected number of edges between nodes i and j in graph G', and the actual number of paths between nodes i and j in graph G can be taken as the actual number of edges between node i and node j in graph G'. The equivalent random graph G'' is used to find the expected number of edges from node i to node j. In this graph the edges are placed at random subject to three constraints: (i) the total number of edges in G'' is W; (ii) the out-degree of node i in G'' equals the out-degree of node i in G', which is $W_i^{out}$; (iii) the in-degree of node j in G'' equals the in-degree of node j in G', which is $W_j^{in}$. Thus in G'' the probability that an edge will emanate from a particular node depends only on the out-degree of that node; the probability that an edge is incident on a particular node depends only on the in-degree of that node; and the probabilities of the two nodes being the two ends of a single edge are independent of each other. In this case, the probability that an edge exists from i to j is given by P(edge in G' emanates from i) · P(edge in G' is incident on j) $= (W_i^{out}/W)(W_j^{in}/W)$. Since the total number of edges in G'' is W, the expected number of edges between i and j is $W \cdot (W_i^{out}/W)(W_j^{in}/W) = \overline{NC}_{ij}(\alpha, \beta, n \to \infty)$, the expected α-centrality in G.
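The null model derived above leads directly to a modularity computation. The following is a minimal sketch, assuming NumPy and the closed form of normalized α-centrality (α below 1/λ_1), and omitting the rounding to integers described in the text; the function names and the six-node example graph are hypothetical.

```python
import numpy as np

def normalized_alpha_centrality(A, alpha, beta=1.0):
    """Closed-form normalized alpha-centrality, valid for alpha < 1/lambda_1."""
    A = np.asarray(A, dtype=float)
    C = beta * A @ np.linalg.inv(np.eye(A.shape[0]) - alpha * A)
    return C / C.sum()

def modularity(NC, labels):
    """Eq. (3) with the null model NC_bar_ij = W_i^out * W_j^in / W (rounding omitted)."""
    NC = np.asarray(NC, dtype=float)
    labels = np.asarray(labels)
    W = NC.sum()
    w_out = NC.sum(axis=1)  # weighted paths originating at each node i
    w_in = NC.sum(axis=0)   # weighted paths reaching each node j
    expected = np.outer(w_out, w_in) / W
    same = labels[:, None] == labels[None, :]  # delta(s_i, s_j)
    return float(((NC - expected) * same).sum())

# Hypothetical example: two triangles (0, 1, 2) and (3, 4, 5) joined by the edge 2 - 3.
A = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]])
split = [0, 0, 0, 1, 1, 1]
q_single_group = modularity(normalized_alpha_centrality(A, 0.0), [0] * 6)
q_local = modularity(normalized_alpha_centrality(A, 0.0), split)
q_wider = modularity(normalized_alpha_centrality(A, 0.2), split)
```

With all nodes in one group the modularity vanishes, as required by the derivation, and at α = 0 the value for the two-triangle split equals the usual edge-based modularity of that division, 5/14.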

Once we compute Q(α), we have to select an algorithm to divide the network into communities that maximize Q(α). Brandes et al. [29] have shown that the decision version of modularity maximization is NP-complete. Like others [TH1 [30], we use the leading eigenvector method to obtain an approximate solution. In this method, nodes are assigned to either of two groups based on a single eigenvector corresponding to the largest positive eigenvalue of the modularity matrix. This process is repeated for each group until modularity does not increase further upon division.
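A single bisection step of the leading eigenvector method can be sketched as follows. This is an illustration rather than the implementation used in the paper: it assumes NumPy, symmetrizes the modularity matrix before the eigendecomposition (a no-op for undirected networks), and leaves out the repeated subdivision of each group. The function name and the toy graph are hypothetical.

```python
import numpy as np

def bisect_by_leading_eigenvector(NC):
    """Split nodes into two groups by the sign of the leading eigenvector
    of the modularity matrix B_ij = NC_ij - W_i^out * W_j^in / W."""
    NC = np.asarray(NC, dtype=float)
    W = NC.sum()
    B = NC - np.outer(NC.sum(axis=1), NC.sum(axis=0)) / W
    B = (B + B.T) / 2.0                    # assumption: symmetrize before eigh
    eigvals, eigvecs = np.linalg.eigh(B)   # eigenvalues in ascending order
    if eigvals[-1] <= 1e-12:
        # No positive eigenvalue: the group is left undivided.
        return np.zeros(NC.shape[0], dtype=int)
    return (eigvecs[:, -1] > 0).astype(int)

# Hypothetical example: two triangles (0, 1, 2) and (3, 4, 5) joined by the edge 2 - 3.
A = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]], dtype=float)
NC = A / A.sum()   # normalized alpha-centrality at alpha = 0
labels = bisect_by_leading_eigenvector(NC)
```

Because the overall sign of an eigenvector is arbitrary, only the grouping is meaningful, not which group is labeled 0 or 1.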



C. Relation to Other Centrality Measures 

We can generalize the centrality metric presented above to a notion of path-based connectivity and relate it to other centrality metrics. Let $q = (q_{ij})$ be an n × n matrix such that $q^n_{ij}$ is the number of paths of length n connecting nodes i and j. The number of paths of length one connecting i and j is $q^1_{ij} = A_{ij}$; the number of paths of length two is $q^2_{ij} = (A \times A)_{ij}$, etc. The expected number of paths connecting nodes i and j is $E(q_{ij})$ where:

$$E(q) = W_1 \cdot q^1 + W_2 \cdot q^2 + \ldots + W_n \cdot q^n + \ldots,$$

where $W_k$ can be a scalar or a vector. The proximity score $E(q_{ij})$ can be used to find out how connected, or close, two nodes are.

Several path-based centrality metrics can be expressed in terms of $E(q_{ij})$, including random walk models [SI [71 |3"TH3"3"], degree centrality, Katz score [5], as well as α-centrality. In a random walk model, a particle starts a random walk at node i, and iteratively transitions to its neighbors with probability proportional to the corresponding edge weight. At each step, the particle returns to i with some restart probability (1 − c). The proximity score is defined as the steady-state probability $r_{ij}$ that the particle will reach node j [35].

• If $W_k = c^k \cdot D^{-k}$, where c is a constant and D is an n × n matrix with $D_{ij} = \sum_{j=1}^{n} A_{ij}$ if i = j and 0 otherwise, then $E(q_{ij})$ reduces to the proximity score in random walk models [31, 32].

• If $W_k = \prod_{j=1}^{k} \alpha_j$, where the scalar $\alpha_j$ is the attenuation factor along the j-th link in the path, then $E(q_{ij})$ reduces to the α-centrality score from i to j (normalizing leads to normalized α-centrality). For ease of computation, we have taken $\alpha_1 = \beta$ and $\alpha_i = \alpha, \forall i \neq 1$. α-centrality holds for $\alpha < 1/\lambda_1$ ($\lambda_1$ is the largest eigenvalue of A). However, as shown in the Appendix, normalized α-centrality holds for all values of α.

• When β = α, this in turn reduces to the Katz status score [5].

• If $W_k = \alpha^k$, and the adjacency matrix A is symmetric, then as $\alpha \to 1/|\lambda_1|$, E(q) is proportional to the inner product of the eigenvector corresponding to $\lambda_1$ with itself. It would lead to eigenvector centrality, as shown in the Appendix.

• When $W_1 = 1$ and $W_k = 0, \forall k > 1$, then $E(q_{ij})$ is the degree centrality used in modularity-maximization approaches [15].
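To make two of the reductions in the list above explicit, the short derivation below works through the α-centrality and Katz cases. It assumes a single attenuation factor α for all indirect edges and α below 1/λ_1, so that the geometric series converges; it uses $q^k = A^k$ for the matrix of path counts of length k.

```latex
\begin{align*}
W_k &= \prod_{j=1}^{k} \alpha_j = \beta\,\alpha^{k-1}
      \qquad (\alpha_1 = \beta,\ \alpha_j = \alpha \ \text{for } j \neq 1), \\
E(q) &= \sum_{k=1}^{\infty} W_k\, q^k
      = \sum_{k=1}^{\infty} \beta\,\alpha^{k-1} A^k
      = \beta A \sum_{k=0}^{\infty} (\alpha A)^k
      = \beta A\,(I - \alpha A)^{-1}, \\
\beta = \alpha:\quad
E(q) &= \sum_{k=1}^{\infty} \alpha^k A^k = (I - \alpha A)^{-1} - I .
\end{align*}
```

The second line reproduces the limit of Eq. (1), and the third is the attenuated sum over paths of all lengths that the Katz score is built on.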

We have implemented [32] a simple algorithm to compute $NC(\alpha, \beta, n \to \infty)$ using an alternative formulation of α-centrality:

$$C(\alpha, \beta, n+1) = \beta A + \alpha\, C(\alpha, \beta, n)\, A \tag{4}$$

For any given value of α, this method iteratively computes $C(\alpha, \beta, n \to \infty)$ and consequently $NC(\alpha, \beta, n \to \infty)$ until convergence. Experimentally we have observed that this method reaches convergence very quickly. Considering a network with N nodes and M links, each iteration of this algorithm (for a given value of α) has a runtime complexity of O(MN) and space complexity of O(M).
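The recursion of Eq. (4) can be sketched as below. This is not the authors' implementation: it uses dense NumPy arrays rather than sparse storage, and the stopping rule (largest change in any entry of NC below a tolerance) as well as the parameter names `tol` and `max_iter` are assumptions made for this example.

```python
import numpy as np

def normalized_alpha_centrality_iterative(A, alpha, beta=1.0, tol=1e-12, max_iter=500):
    """Iterate Eq. (4), C <- beta*A + alpha*C*A, and normalize after each step."""
    A = np.asarray(A, dtype=float)
    C = beta * A                      # C(alpha, beta, 0): direct edges only
    NC = C / C.sum()
    for _ in range(max_iter):
        C = beta * A + alpha * C @ A  # Eq. (4)
        NC_next = C / C.sum()
        if np.max(np.abs(NC_next - NC)) < tol:
            return NC_next
        NC = NC_next
    return NC  # not converged within max_iter (e.g., no strictly dominant eigenvalue)

# Hypothetical toy graph: triangle (0, 1, 2) with a pendant node 3 attached to node 0.
A = np.array([[0, 1, 1, 1],
              [1, 0, 1, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 0]], dtype=float)

NC_small = normalized_alpha_centrality_iterative(A, alpha=0.3)  # below 1/lambda_1 (about 0.46)
NC_large = normalized_alpha_centrality_iterative(A, alpha=0.9)  # above 1/lambda_1

# Reference values for comparison.
closed = A @ np.linalg.inv(np.eye(4) - 0.3 * A)
closed = closed / closed.sum()
v = np.abs(np.linalg.eigh(A)[1][:, -1])  # leading eigenvector of the symmetric A
eig_centrality = v / v.sum()
```

For α = 0.3 the iteration reproduces the closed form, and for α = 0.9, where unnormalized α-centrality diverges, the row sums of the normalized matrix settle on the rescaled leading eigenvector. The unnormalized matrix C keeps growing in that regime, so a production implementation would need to guard against overflow on long runs.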

In order to study variation in this metric for $\alpha < 1/|\lambda_1|$, we must choose a step size of order $\propto 1/\min(d^{max}_{out}, d^{max}_{in})$, since by the Gershgorin circle theorem, $|\lambda_1| \leq \min(d^{max}_{out}, d^{max}_{in})$, where $d^{max}_{out}$ and $d^{max}_{in}$ are the maximum out- and in-degree of the network, respectively. Since the formulation of normalized α-centrality is very similar to that of PageRank, similar block-based strategies can be used for fast and efficient computation of both PageRank and $NC(\alpha, \beta, n \to \infty)$ [22, 23]. Like PageRank, normalized α-centrality can easily be implemented using the map-reduce paradigm [24], guaranteeing the scalability of this algorithm.
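The degree-based bound makes it possible to pick a step size for α without computing any eigenvalues. The sketch below uses plain Python lists; the function names and the `refinement` factor, which controls how many steps fall below the bound 1/min(max out-degree, max in-degree), are hypothetical choices for illustration.

```python
def gershgorin_bound(A):
    """Upper bound on |lambda_1|: min(maximum out-degree, maximum in-degree)."""
    max_out = max(sum(row) for row in A)        # largest row sum
    max_in = max(sum(col) for col in zip(*A))   # largest column sum
    return min(max_out, max_in)

def alpha_grid(A, refinement=10):
    """Values of alpha in [0, 1], spaced by 1 / (refinement * bound)."""
    step = 1.0 / (refinement * gershgorin_bound(A))
    n_steps = int(round(1.0 / step))
    return [k * step for k in range(n_steps + 1)]

# Hypothetical toy graph: triangle (0, 1, 2) with a pendant node 3 attached to node 0.
A = [[0, 1, 1, 1],
     [1, 0, 1, 0],
     [1, 1, 0, 0],
     [1, 0, 0, 0]]
bound = gershgorin_bound(A)
grid = alpha_grid(A)
```

Since 1/bound is never larger than 1/|λ_1|, a step of 1/(refinement · bound) places at least `refinement` grid points in the interval where unnormalized α-centrality converges.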

III. EMPIRICAL RESULTS 

We apply the formalism developed above to benchmark 
networks studied in literature and a network extracted 
from the social photosharing site Flickr. 

A. Karate Club Network 

First, we study the friendship network of Zachary's karate club [33] shown in Figure 1. During the course of the study, a disagreement developed between the administrator and the club's instructor, resulting in the division of the club into two factions, represented by circles and squares in Figure 1. We find the community division of this network for 0 ≤ α < 0.1487 (the maximum α is given by the reciprocal of the largest eigenvalue of the adjacency matrix). The first bisection of the network results in two communities, regardless of the value of α, which are identical to the two factions observed by Zachary. However, when the algorithm runs to termination (no more bisections are possible), different groups are found for different values of α. For α = 0, the method reduces to edge-based modularity maximization [27] and leads to four groups pH [35] (Figure 1(a)). For 0 < α < 0.14 it discovers three groups (Figure 1(b)), and for α > 0.14, two groups that are identical to the factions found by Zachary (Figure 1(c)). Thus, increasing α allows local groups to merge into more global communities.

Figure 2 shows how the normalized α-centrality scores of nodes change with α. For α = 0, normalized α-centrality reproduces the rankings given by degree centrality. As we show in the appendix, the final rankings produced by normalized α-centrality for this symmetric matrix are the same as those given by the eigenvector centrality. This can be confirmed by their values in Table I. Varying α allows us to smoothly transition from a local to a global measure of centrality.

Nodes 34 and 1 have the highest centrality scores, especially at lower α values. These are the leaders of their communities. It was the disagreement between these nodes, the club administrator (node 1) and instructor (node 34), that led to the club's division. Nodes 33 and 2 also have high centrality and hold leadership positions. All these nodes are also scored highly by betweenness centrality and PageRank. Note that the centrality scores of these nodes decrease with α, indicating that they are far more important locally than globally.

A node may also have high centrality if it is connected to many nodes from different communities. Such nodes, which bridge communities, are crucially important to maintaining cohesiveness and facilitating communication flow in both human [10, 11] and animal [13] groups. We can identify these nodes because their normalized α-centrality increases with α, i.e., they become more important as longer paths become more important. The centrality of nodes 3, 14, 9, 31, 8, 20, 10, etc., increases with α from moderate to relatively high values. While most of these nodes are directly connected to both communities, some are only indirectly connected by longer paths. The betweenness centrality of these nodes is low, but non-zero.

Nodes 25, 26 and 17 have low centrality which decreases with α. These are peripheral members. The betweenness centrality of 17 is zero, as expected, but 25 and 26 have scores similar to 31. The PageRank scores of these peripheral nodes are higher than those of nodes 21, 22, 23, which are connected to central nodes, and comparable to the scores of the bridging nodes 20 and 31. While both betweenness centrality and PageRank correctly pick out leaders, they do not distinguish between locally and globally connected nodes.

Guimera and collaborators [19] proposed a role-based description of complex networks as an alternative to the 'average description' approach, which characterizes network structure in terms of average degree or degree distribution. They define a role in terms of the relative within-community degree z (which measures how well the node is connected to other nodes in its community) and participation coefficient P (which measures how well the node is connected to nodes in other communities). They propose a heuristic classification scheme to assign roles to nodes based on where they fall in the z-P plane and find similar patterns of role-to-role connectivity among networks with similar functional needs and growth mechanisms [20].

FIG. 2: Centrality scores of Zachary club members vs. α.

Figure 3 shows the positions of nodes in the karate club network in the z-P plane. Colored regions demarcate the boundaries of different roles according to Guimera et al.'s classification scheme. Nodes separate into provincial hubs (34, 1), peripheral nodes (33, 2, 28, 14, 31, 29, 20, 3, 9, 10) and ultra-peripheral nodes (the rest of the nodes). No special role is assigned to the bridging nodes, such as 9. Even if the boundary of non-hub connectors is shifted to slightly less than P = 0.5 in order to identify nodes 3, 9, 10 as serving a special role, the method would still miss node 14, whose position in the network is very similar to that of node 9. This is because the method takes into account direct links only, rather than complete connectivity between nodes. The method also requires one to first identify communities in the network, which is a very computationally expensive procedure for large networks.
Our method, on the other hand, uses only matrix multiplication and provides a computationally efficient and scalable way to identify network structure.

TABLE I: Comparison of eigenvector centrality and converged normalized α-centrality for Zachary's karate club network.

node   α-cen     eigenvector    node   α-cen     eigenvector
34     0.075     0.3734         15     0.0204    0.1014
1      0.0714    0.3555         16     0.0204    0.1014
3      0.0637    0.3172         19     0.0204    0.1014
33     0.062     0.3086         21     0.0204    0.1014
2      0.0534    0.266          23     0.0204    0.1014
9      0.0457    0.2274         18     0.0186    0.0924
14     0.0455    0.2265         22     0.0186    0.0924
4      0.0424    0.2112         13     0.0169    0.0843
32     0.0384    0.191          6      0.016     0.0795
31     0.0351    0.1748         7      0.016     0.0795
8      0.0343    0.171          5      0.0153    0.076
24     0.0302    0.1501         11     0.0153    0.076
20     0.0297    0.1479         27     0.0152    0.0756
30     0.0271    0.135          26     0.0119    0.0592
28     0.0268    0.1335         25     0.0115    0.0571
29     0.0263    0.1311         12     0.0106    0.0529
10     0.0206    0.1027         17     0.00475   0.0236



B. Florentine Families 

Padgett [36] studied the structure of political and business relationships among the elite families of Renaissance Florence. The two rival factions during this period were the oligarchs, composed of the patrician families, and the Mediceans, who formed close ties with the newly powerful businessmen, or the "new men." Before the rise of the Medicis, the oligarchs dominated Florentine politics and economics and cemented their power through marriage. They were less willing to enter into business relationships with the "new men." The Medicis, on the other hand, consolidated their power through business and marriage relationships. Scholars have studied the business and marriage networks of Renaissance Florence to explain the outcomes of the power struggles between the factions and the rise of the Medici family during this important period of Western European history.

FIG. 3: Classification of karate club nodes according to the roles scheme proposed by Guimera et al. [19]: (i) non-hubs (z < 2.5) are divided into ultra-peripheral, peripheral, and connector nodes (kinless nodes whose links are homogeneously distributed among all communities are not shown); (ii) hubs (z > 2.5) are subdivided into provincial hubs (majority of links within their own community) and connector hubs (many links to other communities). Global hubs whose links are homogeneously distributed among all communities are not shown.

We applied our community detection algorithm to the heterogeneous network containing both marriage and business ties shown in Figure 4. The marriage ties are shown by straight lines. The dashed lines show the different business relations. The marriage ties are asymmetric, with the wife-giving family being considered superior to the wife-receiving family. All relations are weighted equally. We symmetrized the resulting adjacency matrix by adding it to its transpose. We studied the community division of this network for values of α in the range 0 < α < 0.25, since $0.25 < 1/\lambda_1 < 0.26$. For 0 < α < 0.05 we found seven distinct groups, shown in Fig. 4(a). Two of these are small and disconnected from the rest of the network. The first of these groups is composed of the Guadigni, Fioravanti and Bischeri families and the other of the Orlandini and Davazati families. The three largest groups within the connected component are: (1) families aligned with the Medici, (2) families aligned with the oligarchs, such as the Strozzi and Peruzzi, and (3) mostly oligarch families with split loyalties, like the Albizzi. For lower values of α, the oligarchs are split into two groups (Fig. 4(a), (b)). The rift within the oligarchs detected by our algorithm is corroborated by historic events. When a lottery randomly produced too many Medici officeholders in the Signoria (1433), Rinaldo Albizzi, the titular head of the oligarchs, sent out word to assemble troops in order to forcibly seize the Signoria from the Medicis. However, his repeated efforts to assemble troops (especially from Palla Strozzi) were frustrated by other supporters changing their minds and drifting away [36], indicating a factional split within the oligarchs. In contrast, the Medicis could immediately and effectively mobilize their supporters, as a result of which no military action ensued and Cosimo Medici took over the budding Florentine state.



As we increase the scale of interactions by increasing α, the five groups within the connected component gradually coalesce into three distinct groups, as shown in Fig. 4. First, the group comprising Guasconi, Da-Uzzano and Ardinghelli integrates with the Medicis (Fig. 4(b)). When α is further increased, the group comprising Rondinelli, Solosmei and Della Casa integrates with the oligarchs (Fig. 4(c)).



Figure 5 shows how the normalized α-centrality scores of the families in the heterogeneous business-marriage network change with α. The Guasconi family has the highest centrality score across all values of α. This is not surprising given this family's central position in the network bridging the oligarchs and the Mediceans. This observation is corroborated by the finding that the cross-pressurised (by the Mediceans and the oligarchs) Guasconis were split in their partisan loyalties [36]. Similarly, the Medicis, who were able to expertly exploit both business and marriage connections, increase in importance as α increases. On the other hand, the oligarchs such as the Strozzi and Peruzzi families, who were patrician to the core and had few business relations outside of their faction, see their centrality decrease with α. Thus, the historic ascendance of the Mediceans can be observed already in the business and marriage networks they created.



Although the heterogeneous network of Florentine families is not a symmetric network, since it contains asymmetric marriage relations, we find that the rankings produced by normalized α-centrality for $\alpha > 1/|\lambda_1|$ are well correlated with those produced by eigenvector centrality. However, as observed by Bonacich [1], for the asymmetric marriage network there are important differences between the rankings of eigenvector centrality and normalized α-centrality. For instance, the eigenvector centrality scores of Bischeri, Guadigni and Orlandini are zero, even though they were wife-giving families. The reason for this is their segregation from the large connected component. This anomaly is corrected by normalized α-centrality, though the centrality scores for Bischeri, Guadigni and Orlandini are very small.
FIG. 4: The groups in the Florentine family data set: (a) 0 < α < 0.05, (b) 0.05 < α < 0.1, (c) 0.1 < α < 0.25.

FIG. 5: Normalized α-centrality scores of the families within the business-marriage network. Some of the nodes are identified, with the rest shown in grey.



C. Other Real-World Networks

In addition to the social networks described above, we evaluated the performance of our community division algorithm on three other real-world networks: the US College football and the political books networks, as well as the social network retrieved from the social photosharing site Flickr. We were not able to evaluate rankings due to the lack of ground truth for these data sets. The first network represents the schedule of Division 1 games for the 2001 season, where the nodes represent teams and the edges represent the regular season games between teams [37]. The teams are divided into conferences containing 8 to 12 teams each. Games are more frequent between members of the same conference, though inter-conference games also take place. This suggests that the natural communities may be larger than conferences.

The political books network represents books about US politics sold by the online bookseller Amazon [48]. Edges represent frequent co-purchasing by the same buyers, as indicated by the "customers who bought this book also bought these other books" feature of Amazon. The nodes were labeled liberal, neutral, or conservative by Mark Newman after reading their descriptions and reviews on Amazon [49]. We take these labels as communities.

To collect the final data set, we sampled Flickr's social network by identifying roughly 2000 users interested in one of three topics: portraiture, wildlife, and technology. We used the Flickr API to perform a tag search using relevant keywords to retrieve the 500 'most interesting' images for each topic and extracted the names of the users who uploaded these images [50]. Further, we identified four users (eight for the wildlife topic) who were interested in each topic by studying their profiles, specifically group membership and the user's tags. Groups such as "Big Cats", "Zoo", "The Wildlife Photography", etc. pointed to a user's interest in wildlife. In addition, tags that users attached to their images could also help identify their interests. Users who used nature and macro tags were probably interested in wildlife rather than technology. Similarly, users interested in human, rather than animal, portraiture tagged their images with baby and family. We then used the Flickr API to retrieve these users' contacts, as well as their contacts' contacts, and labeled all by the topic through which they were discovered. We reduced this network to an undirected network of mutual contacts only, resulting in a network of 5747 users, with 1620, 1337 and 2790 users labeled technology, portraiture and wildlife, respectively. Although we did not verify that all the users were interested in the topics they were labeled with, we use these 'soft' labels to evaluate the discovered communities.

We use purity to evaluate the quality of discovered 
communities. We define purity as the fraction of all pairs 
of objects in the same community that are assigned to the 
same group by the algorithm. This is a simplified version 
of the Wallace criterion [38] for evaluating performance
of clustering algorithms. 
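
The pairwise purity defined above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used for the experiments: the function name, the label encoding, and the five-node example are assumptions made here.

```python
from itertools import combinations

def pairwise_purity(true_labels, found_labels):
    """Fraction of node pairs sharing a ground-truth community
    that the algorithm also places in the same group."""
    same_true = 0   # pairs that share a ground-truth community
    same_both = 0   # of those, pairs that also share a discovered group
    for i, j in combinations(range(len(true_labels)), 2):
        if true_labels[i] == true_labels[j]:
            same_true += 1
            if found_labels[i] == found_labels[j]:
                same_both += 1
    if same_true == 0:
        raise ValueError("no pair of nodes shares a ground-truth community")
    return same_both / same_true

# Hypothetical example: five nodes, two ground-truth communities.
truth = ["liberal", "liberal", "liberal", "conservative", "conservative"]
found = [0, 0, 1, 1, 1]
score = pairwise_purity(truth, found)
```

Here four pairs share a ground-truth label and two of them are grouped together by the hypothetical algorithm, so the purity is 0.5.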

The number and purity of the communities found in
networks as a function of the parameter α are shown in
Table II. The case α = 0 corresponds to the edge-based
modularity method. As α increases, the number of groups
discovered in all networks goes down, while their purity
increases. This is consistent with our hypothesis that using
smaller values of α allows us to identify more local
network structure, while larger values of α lead to more
global structure. In the Karate club network, for example,
at α = 0, there are four small communities, as shown
in Fig. 1(a). These local communities coalesce into two
large groups as α increases (Fig. 1(c)), which are identical
to the groups identified by Zachary [34].

TABLE II: The number and purity of communities discovered
at different values of α

To evaluate the communities discovered in the Florentine
families network, we use the tight constraint of party
loyalty. Hence the families could be either Medicean, oligarch,
or have split loyalties. We note that this is a very
conservative evaluation criterion, since there were factions
present within the parties themselves [36]. Since
families with split loyalties would be correctly classified
as belonging to either of the two parties, we remove them
from the purity calculation, focusing instead on identifying
communities of party loyalists only. Purity is further reduced
by the presence of the two isolated groups. However,
the purity of discovered communities increases with α.
The small local communities found at lower values of α
could indicate factions within parties.



IV. RELATED WORK 

A variety of metrics have been proposed to measure a
node's centrality in a network [Tl 151 151 171 151 1551 1551 130], yet
few studies systematically evaluated their performance
on real-world networks. Liben-Nowell and Kleinberg [41]
compared the performance of several commonly used centrality
metrics on the link prediction task and found the Katz
score [8] to be the most effective measure for this task,
outperforming PageRank [5] and its variants. The α-centrality
metric modifies the Katz score by introducing a
parameter α that gives a weight to indirect links and also
sets the length scale of interactions in the network. We
showed recently [42] that normalized α-centrality outperforms
other centrality metrics on the task of predicting
influential nodes in an online social network.

Guimera and collaborators [H1[2U] proposed a role-based
description of complex networks. They define a role in
terms of the relative within-community degree z (which 
measures how well the node is connected to other nodes 
in its community) and participation coefficient P (which 
measures how well the node is connected to nodes in 
other communities). They proposed a heuristic classi- 
fication scheme based on where the nodes lie in the z-P 
plane. This classification scheme is similar to the locally vs.
globally connected distinction we are making, with connector
nodes being more globally connected nodes while
provincial hubs and peripheral nodes are more locally 
connected. Role-based analysis requires community de- 
composition of the network to be performed first. This 
is a computationally expensive procedure for most real- 
world networks. Our approach, on the other hand, allows 
us to differentiate between roles of nodes in a more com- 
putationally efficient way. 

Community detection is another active area in networks
research (see [14] for a comprehensive review).
Like us, Arenas et al. [13] have generalized modularity
to find correlations between nodes that go beyond nearest
neighbors. Their approach relies on the presence of
motifs [17, 18], i.e., connected subgraphs such as cycles,
to identify communities within a network. For example,
a higher than expected density of triangles implies the presence
of a community, and a triangle modularity may be
defined to identify it. The motif-based modularity uses
the size of the motif to impose a limit on the proximity
of neighbors. Our method, on the other hand, imposes
no such limit. The measure of global correlation computed
using α-centrality is equal to the weighted average
of correlations for motifs of different sizes. Our method
enables us to easily calculate this complex term.



V. CONCLUSION 

In this paper, we introduced normalized α-centrality
as a metric to study network structure. Like the original
α-centrality [1] on which it is based, this metric measures
the number of paths that exist between nodes in
a network, attenuated by their length with the attenuation
parameter α. This parameter sets the length scale of
the interaction. When α = 0, the centrality metric takes
into account direct edges only and is equivalent to degree
centrality. As α increases, the metric takes into consideration
more distant network interactions, becoming a
more global measure. Normalized α-centrality allows us
to smoothly interpolate between local metrics, such as
degree centrality, and global metrics, such as eigenvector
centrality [1]. Unlike the original α-centrality, which
bounds α to be less than the reciprocal of the largest
eigenvalue of the adjacency matrix of the network, normalized
α-centrality sets no such limit.

We used normalized α-centrality to study the structure
of networks, specifically, to identify important nodes
and communities within the network. We extended the
modularity maximization class of algorithms [37] to use
(normalized) α-centrality, rather than edge density, as
a measure of network connectivity. For small values of
α, smaller, more locally connected communities emerge,
while for larger values of α, we observe larger globally
connected communities. We also used this metric to rank
nodes in a network. By studying changes in rankings that
occur when parameter α is varied, we were able to identify
locally important 'leaders' and globally important
'bridges' or 'brokers' that facilitate communication between
different communities. We applied this approach
to benchmark networks studied in the literature and found
that it results in network division in close agreement with
the ground truth. We can easily extend this definition to
multi-modal networks that link entities of different types,
and use the approach described in this paper to study the
structure of such networks [44].
Appendix A: Proofs of Convergence

In this section we prove some properties of the normalized
α-centrality metric proposed in this paper. The
α-centrality matrix $C(\alpha,\beta,n)$, $\forall \alpha \in [0,1]$, is defined as:

\[
C(\alpha,\beta,n) = \beta A \left(I + \alpha A + \alpha^2 A^2 + \cdots + \alpha^n A^n\right)
= \beta A \sum_{k=0}^{n} \alpha^k A^k
\tag{A1}
\]



The normalized α-centrality matrix is then given by

\[
NC(\alpha,\beta,n) = \frac{C(\alpha,\beta,n)}{\sum_{i,j} C_{ij}(\alpha,\beta,n)}
\tag{A2}
\]

The normalized α-centrality vector is $NC_i(\beta,\alpha,n\to\infty) = e\,NC(\alpha,\beta,n\to\infty)$,
where $e$ is a $1 \times N$ unit vector
and $N$ is the number of nodes in the network.
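
As an illustration of Equations (A1) and (A2), the following minimal Python sketch evaluates the truncated sum and its normalization on a small toy graph. The function names, the toy adjacency matrix, the choice β = 1, and the reading of e as an all-ones row vector are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def alpha_centrality_matrix(A, alpha, beta=1.0, n=50):
    """Truncated sum C(alpha, beta, n) = beta * A * sum_{k=0}^{n} alpha^k A^k."""
    size = A.shape[0]
    term = np.eye(size)            # current term alpha^k A^k, starting at k = 0
    total = np.eye(size)           # running sum of the terms
    for _ in range(n):
        term = alpha * (term @ A)  # advance from alpha^k A^k to alpha^(k+1) A^(k+1)
        total = total + term
    return beta * (A @ total)

def normalized_alpha_centrality(A, alpha, beta=1.0, n=50):
    """Node scores e * NC, with NC = C / (sum of all entries of C)."""
    C = alpha_centrality_matrix(A, alpha, beta, n)
    NC = C / C.sum()
    e = np.ones(A.shape[0])        # assumption: e is the all-ones row vector
    return e @ NC

# Hypothetical toy graph: a triangle 0-1-2 with a pendant node 3 attached to node 2.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

scores_local = normalized_alpha_centrality(A, alpha=0.0)  # only direct edges count
scores_wider = normalized_alpha_centrality(A, alpha=0.3)  # longer paths contribute
```

With α = 0 the scores reduce to each node's degree divided by the total degree of the graph, which is the degree-centrality limit described in the text.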
If $\lambda$ is an eigenvalue of $A$, then

\[
\left(I - \frac{1}{\lambda} A\right) x = 0
\tag{A3}
\]

Invertibility of $(I - \frac{1}{\lambda}A)$ would lead to the trivial solution
of eigenvector $x$ ($x = 0$). Hence for computation of
eigenvalues and eigenvectors, we require that no inverse
of $(I - \frac{1}{\lambda}A)$ should exist, i.e.

\[
\mathrm{Det}\left(I - \frac{1}{\lambda} A\right) = 0.
\tag{A4}
\]

Equation (A4) is called the characteristic equation, solving
which gives the eigenvalues and eigenvectors of the
adjacency matrix $A$.

The adjacency matrix $A$ can be written as:

\[
A = X \Lambda X^{-1} = \sum_{i=1}^{N} \lambda_i Y_i
\tag{A5}
\]

where $X$ is a matrix whose columns are the eigenvectors
of $A$, and $\Lambda$ is a diagonal matrix whose diagonal elements
are the eigenvalues of $A$, $\Lambda_{ii} = \lambda_i$, arranged according to
the ordering of the eigenvectors in $X$. Without loss of
generality we assume that $\lambda_1 > \lambda_2 > \cdots > \lambda_N$. The
matrices $Y_i$ can be determined from the product

\[
Y_i = X Z_i X^{-1}
\tag{A6}
\]

where $Z_i$ is the selection matrix having zeros everywhere
except for element $(Z_i)_{ii} = 1$ [45].

Adjacency matrix $A$ raised to the power $k$ is then given
by

\[
A^k = X \Lambda^k X^{-1} = \sum_{i=1}^{N} \lambda_i^k Y_i
\tag{A7}
\]
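
The spectral decomposition in Equations (A5) to (A7) can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses a hypothetical symmetric toy graph, so that NumPy's `eigh` applies, and the helper name `power_from_spectrum` is introduced here.

```python
import numpy as np

# Hypothetical toy graph: a triangle 0-1-2 with a pendant node 3 attached to node 2.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

lam, X = np.linalg.eigh(A)   # symmetric A: real eigenvalues, eigenvectors as columns of X
X_inv = np.linalg.inv(X)

# Y_i = X Z_i X^{-1} keeps column i of X and row i of X^{-1}, i.e. their outer product.
Y = [np.outer(X[:, i], X_inv[i, :]) for i in range(len(lam))]

def power_from_spectrum(k):
    """Rebuild A^k as sum_i lambda_i^k Y_i."""
    return sum(lam[i] ** k * Y[i] for i in range(len(lam)))

A_rebuilt = power_from_spectrum(1)   # should reproduce A
A_cubed = power_from_spectrum(3)     # should reproduce A @ A @ A
```

The matrices Y_i also sum to the identity, which is why the k = 0 term of the series contributes I.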



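Using Equation (A7), Equation (A1) reduces to

\[
C(\alpha,\beta,n) = \beta A \sum_{i=1}^{N} \sum_{k=0}^{n} \alpha^k \lambda_i^k Y_i
= \beta A \sum_{i=1}^{N} \frac{(-1)^{p_i}\left(1 - \alpha^{n+1}\lambda_i^{n+1}\right)}{(-1)^{p_i}\left(1 - \alpha\lambda_i\right)} Y_i
\tag{A8}
\]

where $p_i = 0$ if $\alpha|\lambda_i| < 1$, and $p_i = 1$ if $\alpha|\lambda_i| > 1$.
For Equations (A1) and (A8) to hold non-trivially, $\alpha \neq 1/|\lambda_i|$, $\forall i \in 1, 2, \ldots, N$.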

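We characterize the series $\{NC(\alpha,\beta,n\to\infty)\}$ for $\alpha \in [0,1]$ as follows:

1. $\alpha \ll 1/|\lambda_1|$: If $\alpha \ll 1/|\lambda_1|$, $C(\alpha,\beta,n\to\infty)$ (and
$NC(\alpha,\beta,n\to\infty)$) would be independent of $\alpha$,
since

\[
\begin{aligned}
C(\alpha,\beta,n\to\infty) &\approx \beta A \\
NC(\alpha,\beta,n\to\infty) &\approx \frac{A}{\sum_{i,j} A_{ij}}
\end{aligned}
\tag{A9}
\]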



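2. $\alpha < 1/|\lambda_1|$: The sequence of matrices $\{C(\alpha,\beta,n)\}$
converges to $C(\alpha,\beta)$ as $n \to \infty$ if all the sequences
$\{(C(\alpha,\beta,n))_{ij}\}$ for every fixed $i$ and $j$ converge to
$(C(\alpha,\beta))_{ij}$ [46]. If $\alpha < 1/|\lambda_1|$, $C(\alpha,\beta,n)$ converges
to $C(\alpha,\beta)$:

\[
C(\alpha,\beta,n\to\infty) = \beta A \sum_{i=1}^{N} \frac{1}{1 - \alpha\lambda_i} Y_i = \beta A (I - \alpha A)^{-1} = C(\alpha,\beta)
\tag{A10}
\]

\[
NC(\alpha,\beta,n\to\infty) = \frac{C(\alpha,\beta)}{\sum_{i,j} C_{ij}(\alpha,\beta)}
\tag{A11}
\]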
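3. $\alpha > 1/|\lambda_1|$ and $n \to \infty$: $\beta\alpha^n A^{n+1}$ dominates in
Equation (A8):

\[
\begin{aligned}
C(\alpha,\beta,n\to\infty) &\approx \beta\alpha^n A^{n+1} \\
NC(\alpha,\beta,n\to\infty) &= \frac{A^{n+1}}{\sum_{i,j} \left(A^{n+1}\right)_{ij}}
\end{aligned}
\tag{A12}
\]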
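
The convergence claimed in case 2 can be illustrated numerically: below the threshold, the truncated sum of Equation (A1) approaches the closed form βA(I − αA)^{-1}. The sketch is a minimal check on a hypothetical toy graph; the graph, the value β = 1, the choice of α at half the threshold, and the helper name `truncated` are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical toy graph: a triangle 0-1-2 with a pendant node 3 attached to node 2.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
size = A.shape[0]
beta = 1.0

lam_max = np.abs(np.linalg.eigvalsh(A)).max()   # |lambda_1|
alpha = 0.5 / lam_max                           # safely below 1/|lambda_1|

closed_form = beta * A @ np.linalg.inv(np.eye(size) - alpha * A)

def truncated(n):
    """C(alpha, beta, n) = beta * A * sum_{k=0}^{n} alpha^k A^k."""
    term = np.eye(size)
    total = np.eye(size)
    for _ in range(n):
        term = alpha * (term @ A)
        total = total + term
    return beta * A @ total

# Largest entrywise gap between the truncated sum and the closed form.
gaps = [np.abs(truncated(n) - closed_form).max() for n in (5, 20, 60)]
```

The gap shrinks geometrically with n because each additional term is scaled by a factor of at most α|λ_1| = 0.5 in this example.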



Theorem 1 The induced ordering of nodes due to normalized
α-centrality is equal to the induced ordering of
nodes due to α-centrality for $\alpha < 1/|\lambda_1|$.

Proof. Since the centrality score due to α-centrality is
$e\,C(\alpha,\beta,n\to\infty)$ and that due to normalized α-centrality
is $e\,NC(\alpha,\beta,n\to\infty)$, from Equations (A9) and (A11)
the induced ordering of nodes due to α-centrality ($\alpha <
1/|\lambda_1|$) would be equal to the induced ordering of nodes due
to normalized α-centrality ($\alpha < 1/|\lambda_1|$).

Theorem 2 The value of the normalized α-centrality matrix
remains the same $\forall \alpha \in (1/|\lambda_1|, 1]$
($NC(\alpha > 1/|\lambda_1|, \beta, n\to\infty) = NC(\beta, n\to\infty)$).

Proof. As can be seen from Equation (A12), when $\alpha >
1/|\lambda_1|$ and $n \to \infty$, $NC(\alpha > 1/|\lambda_1|, \beta, n\to\infty)$ reduces
to $A^{n+1}/\sum_{i,j}(A^{n+1})_{ij} = NC(\beta, n\to\infty)$ and is independent
of $\alpha$.
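
Theorem 2 can be illustrated with a finite but large n. The sketch below is a numerical check, not a proof: it assumes a hypothetical non-bipartite toy graph, so that the largest eigenvalue strictly dominates in magnitude, and it sets β = 1 because β cancels in the normalization. The helper name `normalized_matrix` and the values of α and n are chosen here for illustration.

```python
import numpy as np

# Hypothetical toy graph: a triangle 0-1-2 with a pendant node 3 attached to node 2.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

def normalized_matrix(alpha, n=200):
    """NC(alpha, beta, n) from the truncated sum; beta cancels, so it is set to 1."""
    size = A.shape[0]
    term = np.eye(size)
    total = np.eye(size)
    for _ in range(n):
        term = alpha * (term @ A)
        total = total + term
    C = A @ total
    return C / C.sum()

threshold = 1.0 / np.abs(np.linalg.eigvalsh(A)).max()   # 1/|lambda_1|

NC_a = normalized_matrix(0.6)    # above the threshold
NC_b = normalized_matrix(1.0)    # further above the threshold

# Right-hand side of Equation (A12) with n = 200.
A_pow = np.linalg.matrix_power(A, 201)
NC_limit = A_pow / A_pow.sum()
```

Both values of α lie above the threshold for this graph, and the two normalized matrices agree with each other and with the normalized power of A to within floating-point error.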

The remaining theorems hold under the condition that 
$|\lambda_1|$ is strictly greater than any other eigenvalue, which
is true in most real life cases studied. 

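Theorem 3 $\lim_{\alpha\to 1/|\lambda_1|} NC_i(\alpha,\beta,n\to\infty)$ exists and
$\lim_{\alpha\to 1/|\lambda_1|} NC_i(\alpha,\beta,n\to\infty) = NC_i(\beta,n\to\infty) =
NC_i(\alpha > 1/|\lambda_1|,\beta,n\to\infty) = e A Y_1 / \sum_{i,j} (A Y_1)_{ij}$.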

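Proof. Under the assumption that $|\lambda_1|$ is strictly
greater than any other eigenvalue, as $\alpha \to 1/|\lambda_1|^-$,
Equation (A10) reduces to

\[
C(\alpha \to 1/|\lambda_1|^-, \beta, n\to\infty) \approx \frac{\beta}{1 - \alpha\lambda_1} A Y_1
\tag{A13}
\]

This is because all other eigenvectors shrink in importance
as $\alpha \to 1/|\lambda_1|^-$. Therefore, as $\alpha \to 1/|\lambda_1|^-$, we
have

\[
NC(\alpha \to 1/|\lambda_1|^-, \beta, n\to\infty) = \frac{A Y_1}{\sum_{i,j} (A Y_1)_{ij}}
\tag{A14}
\]

Under the assumption that $|\lambda_1|$ is strictly greater than
any other eigenvalue, $\beta\alpha^n\lambda_1^n A Y_1$ dominates in
Equations (A8) and (A12) as $\alpha \to 1/|\lambda_1|^+$ and $n \to \infty$:

\[
\begin{aligned}
C(\alpha \to 1/|\lambda_1|^+, \beta, n\to\infty) &\approx \beta\alpha^n\lambda_1^n A Y_1 \\
NC(\alpha \to 1/|\lambda_1|^+, \beta, n\to\infty) &\propto A Y_1
\end{aligned}
\tag{A15}
\]

Hence from Equation (A15), as $\alpha \to 1/|\lambda_1|^+$, we have

\[
NC(\alpha \to 1/|\lambda_1|^+, \beta, n\to\infty) = \frac{A Y_1}{\sum_{i,j} (A Y_1)_{ij}}
\tag{A16}
\]

Since

\[
\lim_{\alpha\to 1/|\lambda_1|^-} NC(\alpha,\beta,n\to\infty) = \lim_{\alpha\to 1/|\lambda_1|^+} NC(\alpha,\beta,n\to\infty),
\]

the limit $\lim_{\alpha\to 1/|\lambda_1|} NC(\alpha,\beta,n\to\infty)$ exists
and

\[
\lim_{\alpha\to 1/|\lambda_1|} NC(\alpha,\beta,n\to\infty) = \frac{A Y_1}{\sum_{i,j} (A Y_1)_{ij}} = NC(\beta,n\to\infty).
\]

Since $NC_i(\alpha,\beta,n\to\infty) = e\,NC(\alpha,\beta,n\to\infty)$,
$\lim_{\alpha\to 1/|\lambda_1|} NC_i(\alpha,\beta,n\to\infty) = NC_i(\beta,n\to\infty) =
NC_i(\alpha > 1/|\lambda_1|,\beta,n\to\infty) = e A Y_1/\sum_{i,j}(A Y_1)_{ij}$.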

Theorem 4 For symmetric matrices, the induced ordering
of nodes due to eigenvector centrality $C_e$ is equivalent
to the induced ordering of nodes given by normalized
centrality $NC_i(\beta,n\to\infty) = \lim_{\alpha\to 1/|\lambda_1|} NC_i(\alpha,\beta,n\to\infty) =
NC_i(\alpha > 1/|\lambda_1|,\beta,n\to\infty) = e A Y_1/\sum_{i,j}(A Y_1)_{ij}$.



Proof. For symmetric matrices

\[
A = X \Lambda X^{-1} = X \Lambda X^{T}
\tag{A17}
\]

Therefore Equation (A6) reduces to

\[
Y_i = X Z_i X^{T} = X_i X_i^{T}
\tag{A18}
\]

where $X_i$ is the column of $X$ representing the eigenvector
corresponding to $\lambda_i$. Hence, in the case of symmetric
matrices:

\[
\begin{aligned}
NC_i(\beta,n\to\infty) &= NC_i(\alpha > 1/|\lambda_1|,\beta,n\to\infty) \\
&= \lim_{\alpha\to 1/|\lambda_1|} NC_i(\alpha,\beta,n\to\infty) \\
&= \frac{e A Y_1}{\sum_{i,j} (A Y_1)_{ij}} \\
&= c_1\, e A X_1 X_1^{T} = c_2 X_1^{T}
\end{aligned}
\tag{A19}
\]

where $c_1 = 1/\sum_{i,j} (A Y_1)_{ij}$ and $c_2 = c_1\, e A X_1$.

Since $X_1$ corresponds to the eigenvector centrality
vector $C_e$, for symmetric matrices the induced
ordering of nodes given by eigenvector centrality
$C_e$ is equivalent to the induced ordering of
nodes given by normalized centrality $NC_i(\beta,n\to\infty) =
\lim_{\alpha\to 1/|\lambda_1|} NC_i(\alpha,\beta,n\to\infty) = NC_i(\alpha >
1/|\lambda_1|,\beta,n\to\infty) = e A Y_1/\sum_{i,j}(A Y_1)_{ij}$.
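
The relation in Equation (A19) can be checked numerically for a symmetric adjacency matrix. The sketch below is an illustration under stated assumptions: the toy graph is hypothetical, e is taken to be the all-ones row vector, and eigenvector centrality is taken to be the leading eigenvector rescaled to sum to one.

```python
import numpy as np

# Hypothetical toy graph: a triangle 0-1-2 with a pendant node 3 attached to node 2.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

lam, X = np.linalg.eigh(A)           # eigenvalues in ascending order
x1 = X[:, -1]                        # eigenvector of the largest eigenvalue
x1 = x1 if x1.sum() > 0 else -x1     # fix the arbitrary sign returned by the solver

Y1 = np.outer(x1, x1)                # Y_1 = X_1 X_1^T for symmetric A
e = np.ones(A.shape[0])              # assumption: e is the all-ones row vector

M = A @ Y1
limit_scores = (e @ M) / M.sum()     # e A Y_1 / sum_ij (A Y_1)_ij

eigenvector_centrality = x1 / x1.sum()   # leading eigenvector, rescaled to sum to one
```

The two vectors coincide, so they induce the same ordering of nodes; in this example the constant c_2 of Equation (A19) equals the reciprocal of the sum of the entries of X_1.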